Focal Thyroid Incidentalomas on 18F-FDG PET/CT: A Systematic Review and Meta-Analysis on Prevalence, Risk of Malignancy and Inconclusive Fine Needle Aspiration

Background The rising demand for 18F-fluorodeoxyglucose positron emission tomography with computed tomography (18F-FDG PET/CT) has led to an increase in thyroid incidentalomas. Current guidelines are restricted in giving options to tailor diagnostics and to suit the individual patient. Objectives We aimed to explore the extent of potential overdiagnostics by performing a systematic review and meta-analysis of the literature on the prevalence, the risk of malignancy (ROM) and the risk of inconclusive FNAC (ROIF) of focal thyroid incidentalomas (FTI) on 18F-FDG PET/CT. Data Sources A literature search in MEDLINE, Embase and Web of Science was performed to identify relevant studies. Study Selection Studies providing information on the prevalence and/or ROM of FTI on 18F-FDG PET/CT in patients with no prior history of thyroid disease were selected by two authors independently. Sixty-one studies met the inclusion criteria. Data Analysis A random effects meta-analysis on prevalence, ROM and ROIF with 95% confidence intervals (CIs) was performed. Heterogeneity and publication bias were tested. Risk of bias was assessed using the quality assessment of diagnostic accuracy studies (QUADAS-2) tool. Data Synthesis Fifty studies were suitable for prevalence analysis. In total, 12,943 FTI were identified in 640,616 patients. The pooled prevalence was 2.22% (95% CI = 1.90% - 2.54%, I2 = 99%). 5151 FTI had cyto- or histopathology results available. The pooled ROM was 30.8% (95% CI = 28.1% - 33.4%, I2 = 57%). Of the malignant nodules with a specified pathological description, 1308 (83%) were papillary thyroid carcinoma (PTC). The pooled ROIF was 20.8% (95% CI = 13.7% - 27.9%, I2 = 92%). Limitations The main limitations were the low to moderate methodological quality of the studies and the moderate to high heterogeneity of the results. Conclusion FTI are a common finding on 18F-FDG PET/CT. Nodules are malignant in approximately one third of the cases, with the majority being PTC. Cytology results are non-diagnostic or indeterminate in one fifth of FNACs. These findings reveal the potential risk of overdiagnostics of FTI and emphasize that the workup of FTI should be performed within the context of the patient's disease and that guidelines should adopt this patient-tailored approach.


INTRODUCTION
18F-fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) with computed tomography (CT) has become an important diagnostic tool in the assessment of malignancies and inflammatory diseases (1,2). It is estimated that 2.2 million PET/CT scans were performed in the USA in 2019, with an estimated growth of 6% per year since 2013 (3). Due to this rise in imaging demand, incidentalomas are being discovered more often. Incidentalomas are incidentally found lesions unrelated to the clinical indication for 18F-FDG PET/CT (4). The incidence of 18F-FDG incidentalomas increases with age, which makes a further increase in incidence and financial impact likely due to demographic change (5). 18F-FDG is a glucose analog that accumulates in metabolically active tissue like malignant tumors (6). Therefore, incidentalomas discovered on 18F-FDG PET/CT have a relatively high risk of malignancy (ROM) compared to incidentalomas detected by other imaging modalities (e.g. ultrasound). The overall prevalence of incidentalomas on whole body 18F-FDG PET/CT is 2.5% in patients with or without known or suspected cancer (4). Malignant lesions are most commonly found in the gastrointestinal tract, thyroid and lung (6).
Thyroid incidentalomas can be classified as either focal or diffuse. Diffuse 18F-FDG uptake in the thyroid is often caused by inflammatory disease, like (autoimmune) thyroiditis or Graves' disease (7,8). In contrast, focal 18F-FDG uptake is more likely caused by benign thyroid disease or malignancy, i.e. adenoma, thyroid carcinoma, metastasis of another origin or lymphoma. The most recent meta-analyses, which covered the literature up to 2014, reported focal thyroid incidentaloma (FTI) malignancy risks ranging from 34.6% to 37% (8-11).
Guidelines of the American Thyroid Association (ATA), American College of Radiology (ACR), European Thyroid Association (ETA) and British Thyroid Association (BTA) recommend ultrasound (US) guided fine needle aspiration cytology (FNAC) for patients with focal increased uptake in the thyroid gland as detected by 18F-FDG PET/CT (12-14). The guidelines are well-delineated and easy to adhere to, but seem to provoke a reflexive or habitual process that propels patients from incidental discovery of a thyroid nodule to FNAC and even surgery (15). Ultimately, this approach might contribute to a cascade effect of overdiagnostics and overtreatment, affecting the quality of life of these patients. Because the recommendations are strongly based on non-randomized retrospective studies, they are restricted in giving options and modifications to tailor diagnostics and to suit the individual patient with his or her specific characteristics and concerns.
Non-diagnostic or indeterminate results on cytopathology are assessed as undesirable yields of the diagnostic chain, resulting in repeat examinations and anxiety and uncertainty in patients. At the same time, doctors and patients seem to be indifferent or unaware of the impact of this potential hazard. Therefore, different from previous systematic reviews and meta-analyses, we looked beyond the prevalence and the ROM of FTI and also analyzed the risk of inconclusive FNAC (ROIF).
We aimed to explore the extent of potential overdiagnostics by performing a systematic review and meta-analysis of the literature on the prevalence, ROM and ROIF of FTI on 18F-FDG PET/CT, thereby revealing opportunities to improve FTI management.

Literature Search
A systematic literature search was conducted using MEDLINE, Embase and Web of Science to identify relevant articles. Database keywords and text words were searched using thyroid neoplasms, PET and incidental findings including the subcategories and variants of these words as search terms. Similar terms were used for Embase and Web of Science (Supplemental Table 1). The search was restricted to articles published between January 2010 and June 2020, to provide an update to existing meta-analyses. Articles without an English abstract and conference abstracts were excluded. If insufficient data were reported, the authors were contacted to provide additional information. To expand our search, references of retrieved systematic reviews and meta-analyses were screened for additional studies.
The complete search yielded 1156 articles and is displayed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) in Figure 1 (16).

Study Selection
Retrospective and prospective cohort studies providing information on the prevalence of FTI on 18F-FDG PET/CT and/or ROM of 18F-FDG-avid FTI in patients with no prior history of thyroid disease were considered for inclusion. After duplicates were eliminated, studies were screened for eligibility based on title, abstract, and subsequently on full text by two authors (J.F.d.L., H.E.W.) independently. Disagreements on article inclusion were resolved by consensus reading by the same reviewers. Studies were excluded if: (a) only thyroid incidentalomas with diffuse uptake patterns were investigated, or the results of focal and diffuse uptake patterns were not described separately (if both focal and diffuse thyroid incidentalomas were included in a study and described separately, the FTI were considered for further analysis); (b) they concerned a retrospective analysis of surgically treated FTI; (c) they concerned duplicate publications, in which case the study with the largest patient population was included; or (d) the full article was written in a language other than English. Finally, 61 studies were included for analysis. The following data were collected for meta-analysis: the total number of 18F-FDG PET/CTs, the total number of FTI, the number of malignant and benign FTI and the number of FTI with a non-diagnostic or indeterminate cytopathology result after initial FNAC. Specific information regarding the pathological classification or description (based on cytopathology and/or histopathology) of malignant FTI was collected as well. Hürthle cell and follicular carcinomas were considered as one group. Some studies had patients with multiple FTI. These FTI were considered as separate cases.

Data Extraction
For analysis of ROM, FTI were classified according to cytopathology and histopathology. When both results were available, the histopathology result was used. The definition of a malignant cytopathology result was a description of "malignant" or "suspicious for malignancy" or, according to the Bethesda (B) classification, BV and BVI (17). Some studies used the British THY system to classify FNAC results. THY4 was considered equivalent to BV, and THY5 to BVI, as described by the ETA (18). FTI with non-diagnostic/unsatisfactory (BI/THY1), atypia/follicular lesion of undetermined significance (AUS/FLUS) (BIII/THY3a) or (suspicious for) follicular neoplasm (BIV/THY3b) cytopathology were not defined as malignant or benign, unless histopathology or repeat cytopathology was done and decided otherwise (i.e. BII/BV/BVI and THY2/THY4/THY5). FTI were considered benign when they had benign cytopathology (BII/THY2) or histopathology. FTI classified according to ultrasound, scintigraphy or clinical follow-up were not considered for further analysis.
For analysis of ROIF, the numbers of FTI with initial BIII/indeterminate and BI/non-diagnostic cytopathology results were registered separately, independently of the repeat FNAC or histopathology results used for ROM analysis.
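The classification rules above can be sketched in code. The following is a minimal illustration, not the authors' extraction procedure: the function names are hypothetical, free-text cytology reports ("malignant", "suspicious for malignancy") are not handled, and the rule that a conclusive repeat cytology takes precedence over the initial result is an assumption about how "decided otherwise" was applied. BIV is left out of the ROIF helper because the text registers only BI and BIII for that analysis.

```python
from typing import Optional

# British THY categories mapped onto their Bethesda equivalents (per the text).
THY_TO_BETHESDA = {
    "THY1": "BI", "THY2": "BII", "THY3a": "BIII",
    "THY3b": "BIV", "THY4": "BV", "THY5": "BVI",
}
MALIGNANT = {"BV", "BVI"}
BENIGN = {"BII"}


def to_bethesda(category: Optional[str]) -> Optional[str]:
    """Translate a THY category to Bethesda; pass Bethesda labels through."""
    if category is None:
        return None
    return THY_TO_BETHESDA.get(category, category)


def classify_for_rom(cytology: Optional[str],
                     repeat_cytology: Optional[str] = None,
                     histology: Optional[str] = None) -> Optional[str]:
    """Return 'malignant', 'benign', or None (not usable for ROM)."""
    # Histopathology is used whenever it is available.
    if histology in ("malignant", "benign"):
        return histology
    # Assumption: a conclusive repeat FNAC overrides the initial FNAC.
    for category in (to_bethesda(repeat_cytology), to_bethesda(cytology)):
        if category in MALIGNANT:
            return "malignant"
        if category in BENIGN:
            return "benign"
    return None  # BI, BIII, BIV without a conclusive follow-up


def initial_inconclusive(cytology: Optional[str]) -> Optional[str]:
    """Label the initial FNAC for ROIF, ignoring any later result."""
    category = to_bethesda(cytology)
    if category == "BI":
        return "non-diagnostic"
    if category == "BIII":
        return "indeterminate"
    return None
```

Keeping the ROM and ROIF helpers separate mirrors the text: a nodule with an initial BIII result and malignant histology counts as malignant for ROM and as indeterminate for ROIF.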

Quality Assessment
The quality of included studies was assessed using the quality assessment of diagnostic accuracy studies, QUADAS-2 (19). All studies were independently assessed by two reviewers (J.F.d.L., H.E.W.) and disagreements were resolved with consensus reading. QUADAS-2 divides the risk of bias of study methodology into four domains: patient selection, index test, reference test and flow and timing. Studies were considered to have a high risk of bias in the patient selection domain when a non-consecutive or non-random sample was used or inappropriate exclusion criteria were used like "patients with inconclusive cytology". Studies were considered to have a high risk of bias in the index test domain when the 18 F-FDG PET/CTs were interpreted or adjusted with knowledge of the FNAC results. Studies were only classified as high risk in the reference test domain when studies did not use the Bethesda classification to report FNAC results. The patient flow domain was classified as high risk when less than 50% of included patients received FNAC and/or surgery or FNAC was only performed after suspicious US. Domains were considered to be of unclear risk when insufficient information was given to assess methodological quality properly.
QUADAS-2 was used to assess applicability as well. In all studies the patient selection, index test and reference standard met the inclusion criteria and the question of the review.

Statistical Analysis
Prevalence, ROM and ROIF of FTI were calculated using the data extracted from included studies. Regarding ROM, a proportion was calculated using the number of FTI investigated with FNAC and/or surgery as denominator and the number of FTI with malignant cytopathology or histopathology as numerator. ROIF was calculated using the number of initial FNAC as denominator and the number of non-diagnostic and indeterminate FNAC as numerator.
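As an illustration of the per-study proportions defined above, the sketch below computes a proportion with a 95% CI. The Wald (normal approximation) interval is an assumption, since the text does not state how study-level intervals were derived, and the counts in the usage note are hypothetical.

```python
import math


def proportion_ci(events: int, total: int, z: float = 1.96):
    """Proportion with a Wald confidence interval, clipped to [0, 1]."""
    p = events / total
    se = math.sqrt(p * (1.0 - p) / total)  # standard error of a proportion
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return p, lower, upper
```

For a hypothetical study with 12 malignant results among 40 FTI sampled by FNAC and/or surgery, `proportion_ci(12, 40)` gives the study ROM and its interval; the same function applies to ROIF with inconclusive initial FNACs as events and all initial FNACs as total.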
Data were pooled using a random effects model in Review Manager (RevMan) 5.4, the Cochrane Collaboration software. Heterogeneity was tested using the Chi-square test (p < 0.01) and the Higgins and Thompson test to calculate the I2 statistic (20). As this demonstrated a heterogeneous study set, a random effects model was utilized to calculate pooled estimates. Publication bias was assessed using a funnel plot and weighted Egger's regression test (21).
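The pooling step can be illustrated with a short sketch. It assumes the DerSimonian-Laird estimator of between-study variance and inverse-variance weights; the text names only "a random effects model", so this is one plausible reading rather than a description of the analysis as run. The function name and the input values in the usage note are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random effects pooled estimate with Cochran's Q, tau^2 and I^2 (%)."""
    weights = [1.0 / v for v in variances]  # fixed effect weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }
```

Called with study proportions and their variances, for example `dersimonian_laird([0.1, 0.5], [0.001, 0.001])`, the function returns the pooled estimate together with the heterogeneity statistics. An I2 close to 100% indicates that nearly all observed variation reflects differences between studies rather than sampling error.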
A forest plot was generated displaying the individual study prevalence, malignancy risk and percentage of indeterminate and non-diagnostic FNAC results with 95% Confidence Intervals (CIs) and the pooled estimates using the forestplot package in the R environment.
Subgroup analyses were done to identify sources of bias and heterogeneity in the data. Methodological quality and study characteristics (age, indication for PET, geography) were used to divide studies into subgroups. With regard to the latter, some parts of the world, e.g. South Korea, have a higher and faster-increasing incidence of thyroid cancer than other parts. Although this increase has mostly been attributed to overscreening and higher rates of diagnosis, a 'true' difference in incidence due to geographic variation in individual factors like obesity or genotype and environmental factors like iodine supplementation or radiation exposure also plays a role (22).

Study Characteristics
Sixty-one studies were included in the final analysis. The study characteristics are shown in Table 1. Most studies had a    Table 2. Not all studies were suitable for both prevalence and malignancy risk analysis. Therefore, the data presented in the meta-analyses do not match the total number of 18F-FDG PET/CTs or the total number of FTI.

Quality Assessment
The methodological quality of the included studies is summarized in Supplemental Table 2.

Publication Bias
Publication bias was assessed using the malignancy risks reported in the included studies (Figure 2). Egger's regression showed no significant funnel plot asymmetry (t = 0.65, p = 0.52).
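For readers unfamiliar with the test, a sketch of Egger's regression follows. It uses the classical formulation, regressing each study's standardized effect (effect divided by its standard error) on its precision (one divided by the standard error) and testing whether the intercept differs from zero. The weighted variant used in this analysis is not described in detail, so the code is an assumption-laden illustration with a hypothetical function name.

```python
import math


def egger_test(effects, standard_errors):
    """Egger's regression intercept and its t statistic (df = n - 2).

    Requires at least three studies and residuals that are not all zero.
    """
    x = [1.0 / se for se in standard_errors]               # precision
    y = [e / se for e, se in zip(effects, standard_errors)]  # standardized effect
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance of the ordinary least squares fit.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_intercept = math.sqrt(sse / df * (1.0 / n + x_bar ** 2 / sxx))
    return {
        "intercept": intercept,
        "slope": slope,
        "t": intercept / se_intercept,
        "df": df,
    }
```

The p value is obtained by comparing the t statistic with a t distribution on n - 2 degrees of freedom. An intercept close to zero, as reported here, is consistent with a symmetric funnel plot.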
Finally, subgroup analysis using studies that were classified as "low risk" in the patient selection domain (studies with a consecutive design and appropriate exclusion criteria) (N = 34) versus studies that were classified as "high risk" (N = 16) did not result in significantly different pooled prevalences. The prevalence in the "low risk" subgroup was 1.98% (1.70% - 2.25%) and the prevalence was 2.57% (1.80% - 3.33%) in the "high risk" subgroup.

Malignancy Risk
A total of 5151 FTI in 59 studies had cyto- or histopathology results available. One of two excluded studies did not provide sufficient information to calculate the ROM (65), the other did   (40). Of the 5151 included FTI, 1714 FTI were malignant. The pooled ROM was 30.8% (95% CI = 28.1% - 33.4%, I2 = 57%) (Table 4). Of the 1714 malignant nodules, 1584 had a final pathological description available (based on either cytopathology or histopathology). The remaining 130 nodules were described as "malignant", but not specified. Of these 1584 FTI with a pathological description available, 1462 (92%) were of thyroidal origin and 1308 (83%) were papillary thyroid cancer (PTC).
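The counts in this paragraph can be cross-checked with simple arithmetic, using only the figures reported above. The check also shows that the crude (unweighted) malignancy proportion is about 33%, slightly above the pooled ROM of 30.8%; the two are not expected to coincide, because the random effects model weights each study rather than summing raw counts.

```python
sampled = 5151      # FTI with cyto- or histopathology results
malignant = 1714    # malignant FTI
described = 1584    # malignant FTI with a pathological description
thyroidal = 1462    # described FTI of thyroidal origin
ptc = 1308          # described FTI that were papillary thyroid cancer

unspecified = malignant - described          # nodules reported only as "malignant"
crude_rom = malignant / sampled              # unweighted malignancy proportion
thyroidal_share = thyroidal / described
ptc_share = ptc / described

print(f"unspecified malignant nodules: {unspecified}")
print(f"crude ROM: {crude_rom:.1%}")
print(f"thyroidal origin: {thyroidal_share:.0%}, PTC: {ptc_share:.0%}")
```

Note that the 92% and 83% shares use the 1584 nodules with a pathological description as denominator, not all 1714 malignant nodules.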
Finally, a subgroup analysis based on QUADAS-2 was performed. Patient selection, reference test and flow and timing were tested independently with "low risk" and "high risk" as subgroups. No significant difference in pooled ROM could be demonstrated.
Two of the 15 included studies did not use the Bethesda classification to report FNAC results (26,44). They reported inconclusive results as either "non-diagnostic" or "indeterminate".

DISCUSSION
The present systematic review and meta-analysis shows a pooled prevalence of 18 F-FDG-avid focal thyroid incidentalomas (FTI) of 2.2%. Malignancy is found in about one third of the FTI, the vast majority being papillary thyroid cancer (PTC). Nondiagnostic or indeterminate FNAC results are seen in approximately 21% of FNACs, meaning diagnostic uncertainty and new decision making.
This study can be considered an update that includes studies published in the last 10 years, which used newer generations of PET/CT scanners. A major distinction from previous reviews is that we analyzed the risk of inconclusive FNAC (ROIF) with the purpose of estimating the difficulties encountered in the diagnostic chain. Both the risk of malignancy (ROM) and ROIF are key findings of our analyses and illustrate the necessity of tailoring the diagnostics of FTI to suit the preferences and context of the individual patient. The observed ROIF (21%) is comparable to the ROIF found in a general population with thyroid nodules (23%) (84). Our findings concerning prevalence and ROM of FTI on 18F-FDG PET/CT are similar to those in previous meta-analyses, which found FTI prevalences varying between 1.6% and 2.5% and ROM of 35-37% (8-11).
The substantial ROM, along with the common finding and still rising number of FTI on 18F-FDG PET/CT, seems to justify further diagnostics. However, the general approach of continuing to ultrasound guided FNAC might contribute to overdiagnostics and overtreatment of benign nodules and (small) PTC. Similarly, the accompanying non-diagnostic or indeterminate findings on cytopathology might require repeat FNACs or surgeries in one fifth of patients. Additional undesirable consequences of this straightforward approach might be anxiety and interference with definitive treatment planning, in particular in patients with other malignancies making up the main indication for 18F-FDG PET/CT. Moreover, the impact of diagnosing a thyroid malignancy on overall survival in patients with other malignancies is questionable, not to mention the significant health care costs of incidentally detected findings (5,73,85,86). Finally, the general recommendation ignores the importance of engaging patients in making decisions.
Given these points, the options of "inaction" or alternative action and active investigation according to current guidelines need to be explored evenly and the preferred option should be consistent with the patients' wishes and preferences. The clinical context needs to be weighed carefully on the possible scenarios after FNAC and the clinical impact of an incidental thyroid cancer or metastasis with regard to treatment options, risk of complications and adverse effects and prognosis. Patients who are more engaged in their health care decision making are more likely to experience confidence in treatment decisions, satisfaction with treatment, and trust in their providers (87). Our study showed a strong preselection of patients eligible for FNAC and surgery, indicating that further investigations were performed only if the results had impact on treatment algorithms. Similarly, two other studies demonstrated that 18 F-FDG PET/CT incidental findings could be managed appropriately in the clinical context and based on physician and patient decisions (88,89).
When allocating FTI for FNAC, ultrasound classification systems might be valuable. They have been developed in order to improve the uniformity of the interpretation of the sonographic patterns and the stratification of thyroid nodules for FNAC. These ultrasound-based tools have been validated in the general population of patients with nodular goiter and an estimated cancer prevalence of 2-3% (90). As shown in the present study, thyroid cancer prevalence among patients with FTI at 18F-FDG PET/CT is significantly higher. Because the pretest risk of malignancy is therefore higher in the latter group, the aim of the classification system shifts from avoiding unnecessary FNAC to detecting malignancy accurately. Four included studies aimed to assess the reliability of ultrasound classification systems in indicating FNAC and predicting malignancy in FTI on 18F-FDG PET/CT (58,71,72,83). Three of them demonstrated that the malignancy risk of FTI detected on 18F-FDG PET/CT in the low suspicion categories was not increased when compared with the estimated malignancy risks of these categories suggested by the guidelines (58,71,72). The FTI belonging to these categories accounted for 30-37% of the total. Conversely, in two of the studies FTI detected on 18F-FDG PET/CT with intermediate to high suspicion showed an increase in malignancy in comparison with the estimated malignancy risks suggested by the guidelines (58,72). Furthermore, Trimboli et al. compared three ultrasound classifications in indicating FNAC in FTI and showed that all had a good performance, possibly reducing unnecessary FNACs in 25-53% of the total (83).
Though subject to limitations with regard to study design, these preliminary results show that the implementation of ultrasound classification systems might lead to fewer unnecessary FNACs in low-suspicion nodules, whereas the indications for FNAC of intermediate- or high-suspicion nodules might be better supported by evidence. Guidelines are concordant in recommending against routine FNAC of nodules smaller than 1 cm, even if they are highly suspicious on ultrasound (12-14,91).
Regarding the ROIF, ultrasound classification systems might stratify thyroid nodules with BI, BIII and BIV cytology. Guidelines recommend a repeat FNAC after a non-diagnostic initial FNAC. However, repeat US might be considered as well when the initial European Thyroid Association Guidelines for Ultrasound Malignancy Risk Stratification of Thyroid Nodules in Adults (EU-TIRADS) category is 2 or 3 (92). In case of BIII and BIV, clinical management is not that straightforward. Several studies evaluated the usefulness of the ultrasound classification systems in predicting malignancy of thyroid nodules with indeterminate cytology according to the Bethesda classification (93-98). The varying results between the studies are affected by differences in sonographic patterns, cytologic diagnosis and ROM. Even so, the US classifications more or less confirm a gradation in the pretest risk of malignancy. Therefore, it might be possible to guide management after an indeterminate cytological diagnosis based on US patterns. In other words, an intermediate- or high-suspicion ultrasound in a nodule with indeterminate cytology should trigger repeat FNAC or surgery, whereas a nodule with benign appearance may need clinical follow-up. Guidelines have not recommended this sonographic pattern stratification of nodules with indeterminate cytology, and decisions should be made from a multidisciplinary perspective (14).
A new technique to manage indeterminate nodules could be the use of molecular markers. For example, BRAF mutation analysis could guide towards accurate surgical therapy. These molecular tests require standardization of performance characteristics and appropriate calibration as well as analytic validation before clinical interpretation (18,99). Therefore, routine BRAF testing does not (yet) have a place in clinical practice and is not recommended (100).
Some considerations in the interpretation of the results of the present systematic review and meta-analysis should be mentioned. First, a threat to the validity of any meta-analysis is publication bias. Our analyses were not suggestive of publication bias.
Second, the prevalence of the included studies showed substantial heterogeneity. Only age was a significant discriminator with studies with a mean age younger than 60 years having a higher prevalence of FTI. This finding might seem surprising given the fact that the prevalence of thyroid nodules increases with age (101). However, at the same time the prevalence of malignant, and therefore FDG-avid, thyroid nodules decreases with age (102). The subgroups were not controlled for contributing factors, such as sex distribution, histopathology or cytopathology findings, clinical signs of thyroid malignancy or risk factors for developing thyroid cancer, hampering straightforward conclusions (103). Another contributing factor might have been the applied definition of focal increased uptake in the thyroid gland on 18 F-FDG PET/CT. Most studies used visual and semiquantitative assessments, which might be prone to nonreplicability and variability of results. Patient selection, 18 F-FDG PET/CT indication and geographic influences were of minor significance at subgroup analysis.
Third, a major limitation in calculating the ROM was the high degree of preselection of FTI for cyto- or histopathology and the different reference standards for defining malignancy. Although FNAC is valuable by facilitating the diagnostic correlation with histopathology, cytopathology is not considered the gold standard (104-107). Nevertheless, in the present meta-analysis both cyto- and histopathology results were used equally for estimating the ROM. ROM was not calculated using only histopathology results, because most patients undergoing diagnostic surgery were preselected by FNAC. Follicular carcinomas, which are by definition not higher than Bethesda IV, were still included in analysis as Bethesda IV often led to diagnostic surgery.
Fourth, the ROM of the selected studies showed moderate heterogeneity. This might be caused by the retrospective design of most studies with higher risks of bias and non-replicability of methods and results. The visual assessment method for defining FTI on 18 F-FDG PET/CT might also have contributed as the degree of focal uptake of FTI might be of predictive though not of conclusive value for malignancy (10,34,37,41,45,49).
Finally, only one quarter of the included studies were suitable for analysis of the ROIF. Pooling of data resulted in substantial heterogeneity. No sources of heterogeneity could be shown at subgroup analysis. Variability in ROIF might be attributed to the hospital setting (i.e. settings of local multidisciplinary guidelines and consultations and organization of patient flow pathways), the degree of experience of the radiologist performing the FNAC, the availability of a cytopathology technician for on-site assessment of the specimen adequacy and the availability of a pathologist for a second reading of the FNAC. The latter might be of decisive importance as intra- and interobserver variation exists in the distinction between BIII and BIV (108).
The present systematic review and meta-analysis shows that FTI are a common finding on 18 F-FDG PET/CT. Nodules are malignant in approximately one third of cases with the majority being PTC. At the same time, cytology results are non-diagnostic or indeterminate in one fifth of FNACs. Before proceeding to active examination of the FTI, the clinical context and the preferences of the patient should be reviewed and balanced with the possible scenarios after FNAC and the clinical impact of diagnosing PTC.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

FUNDING
The open access publication fee is funded by the Medical Imaging Center (MIC), which is a UMCG research facility. No other sources of funding were used.